21 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
English Hindi Adivasi Oriya Bangla Goan Konkani Gujarati Malayalam Marathi Old Tamil Punjabi Telugu Urdu
Availability:
From Owner
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-in progress
Use:
Machine Translation, Discourse, Sentiment Analysis
Paper:
N/A
Documentation:
None
Written
Web page classification,
Language Type:
Bilingual
Languages:
Hindi Telugu
Availability:
From Owner
License:
<Not Specified>
Size:
3000 Other
Production Status:
Newly created-in progress
Use:
Text Mining
Paper:
N/A
Documentation:
None
Speech
Text-to-Speech Synthesizer,
Language Type:
Multilingual
Languages:
American English English German Telugu Turkish
Availability:
Freely Available
License:
Size:
41 MByte
Production Status:
Existing-updated
Use:
Speech Synthesis
Paper:
N/A
Documentation:
<Not Specified>
Language Type:
Trilingual
Languages:
Bengali Hindi Telugu
Availability:
From Owner
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-finished
Use:
Transliteration test bench
Paper:
N/A
Documentation:
<Not Specified>
Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali Malayalam Tamil Telugu Urdu
Availability:
Freely Available
License:
Size:
461 MByte
Production Status:
Existing-updated
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Neural Machine Translation for Low-Resourced Indian Languages
- Paper track: Multimodality/poster presentation
- Paper status: Accept Poster+DemoSuggested
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Himanshu Choudhary | Indian-Language-Dataset | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali Gujarati Hindi Kannada Malayalam Marathi Punjabi Sindhi Sinhala Tamil Telugu Urdu
Availability:
Freely Available
License:
CreativeCommons
Size:
2 GByte
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Brian Roark | Dakshina dataset | /N |
Documentation:
None
Written
Treebank,
Language Type:
Monolingual
Languages:
Afrikaans Akkadian Amharic Ancient Greek Arabic Armenian Assyrian Bambara Basque Belarusian Bhojpuri Breton Bulgarian Buryat Cantonese Catalan Chinese Classical Chinese Coptic Croatian Czech Danish Dutch English Erzya Estonian Faroese Finnish French Galician German Gothic Greek Hebrew Hindi Hindi English Hungarian Indonesian Irish Italian Japanese Karelian Kazakh Komi Permyak Komi Zyrian Korean Kurmanji Latin Latvian Lithuanian Livvi Maltese Marathi Mbya Guarani Moksha Naija North Sami Norwegian Old Church Slavonic Old French Old Russian Persian Polish Portuguese Romanian Russian Sanskrit Scottish Gaelic Serbian Skolt Sami Slovak Slovenian Spanish Swedish Swedish Sign Language Swiss German Tagalog Tamil Telugu Thai Turkish Ukrainian Upper Sorbian Urdu Uyghur Vietnamese Warlpiri Welsh Wolof Yoruba
Availability:
Freely Available
License:
Various
Size:
25 million words
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joakim Nivre | Universal Dependencies | /N |
Documentation:
https://universaldependencies.org
Written
Corpus,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
<Not Specified>
Size:
22.10G tokens
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | /N |
Documentation:
Yes, on the website.
Written
Lexicon,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
CreativeCommons Attribution 4.0 International
Size:
41 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | /N |
Documentation:
Yes, on the website.
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
Telugu
Availability:
Freely Available
License:
Creative Commons (CC BY-SA 4.0)
Size:
1.1 GByte
Production Status:
Newly created-finished
Use:
Speech Synthesis
- Paper title: Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alexander Gutkin | Crowd-sourced high-quality Telugu multi-speaker speech data set by Google | /N |
Documentation:
README file in English.




